DOCUMENT RESUME 

ED 362 562 TM 020 617 



AUTHOR        Dietel, Ron
TITLE         What Works in Performance Assessment? Proceedings of
              the CRESST Conference (Los Angeles, California,
              September 10-12, 1992). Evaluation Comment.
INSTITUTION   California Univ., Los Angeles. Center for the Study
              of Evaluation.; Center for Research on Evaluation,
              Standards, and Student Testing, Los Angeles, CA.
SPONS AGENCY  Office of Educational Research and Improvement (ED),
              Washington, DC.
PUB DATE      93
CONTRACT      R117G10027
NOTE          25p.
PUB TYPE      Collected Works - Conference Proceedings (021) --
              Reports - Evaluative/Feasibility (142)

EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   Cost Estimates; *Educational Assessment; Educational
              Research; Elementary Secondary Education; Evaluation
              Criteria; Generalization; *Performance; *Portfolios
              (Background Materials); *Psychometrics; Reliability;
              *Student Evaluation; Validity
IDENTIFIERS   *Alternative Assessment; Center for Research on Eval
              Standards Stu Test CA; *Performance Based
              Evaluation



ABSTRACT 

The 1992 annual conference of the Center for Research 
on Evaluation, Standards, and Student Testing (CRESST) was dedicated 
to explaining "What Works in Performance Assessment?" This report 
provides a synopsis of discussions by over 300 policymakers, 
researchers, and teachers. CRESST Co-Director Eva Baker summed up the 
present state of performance assessment in her opening remarks. Other 
researchers concurred with her warning that there is much that is not 
yet known about performance assessment. A beginning has been made, in 
that what is known about standardized tests and the misuse of their 
results has been defined. It is evident that state and federal 
interest in the use of performance assessments is growing. Research 
is being conducted into performance assessment fairness, particularly 
with regard to portfolios. The expected links among instruction, 
learning, and alternative assessment are being investigated. The 
CRESST validity criteria of transfer and generalizability have 
received substantial attention in studies of the technical aspects of 
performance assessment. The final validity criterion that CRESST 
requires of performance assessment focuses on cost and the resources 
needed to implement the new assessments in the classroom. What is 
known above all is that lots of performance assessments are being 
developed. Brief abstracts are given of 13 additional conference 
papers. An attachment lists and annotates the technical reports 
available from CRESST. (SLD) 

Performance assessment research appears to be caught in a law of supply and demand: plenty of demand from impatient customers who want to know if performance assessment actually "works," but too little "supply" in the form of answers from the research community about what works and what doesn't.

The 1992 annual CRESST assessment conference was dedicated to explaining "What Works in Performance Assessment." From September 10-12, 1992, over 300 policymakers, researchers, and teachers met on the UCLA campus to discuss what educators currently know about these new types of tests.

CRESST Co-director Eva Baker 
summed up the present state of 
performance assessment affairs in 
her opening conference remarks. 

"The policy and practitioner communities are acting," warned Baker, "with or without us. We no longer have the luxury of saying we don't have the answers yet but if you'll just hold on for four or five more years, our research will really be able to tell you what to do."



¹ Special thanks to Joan Herman and Katharine Fry for their valuable suggestions to this article. Thanks also to the many presenters and discussants who shared their research at the 1992 CRESST conference.



Baker cautioned against expecting too much from new assessment methods.

"One of the things that worries many of us is the enormous hype that's been associated with performance assessment," said Baker. "It's better than superman, better than chocolate pecan pie, or the fastest, lightest computer notebook."

Other researchers concurred with Baker, including CRESST Co-director Robert Linn. Linn indicated that there is a lot more we don't know about performance assessment than we do know but that a conference focusing on what we have learned about performance assessment would contribute to the expanding base of performance assessment knowledge.

Linn suggested that a framework for the two-day meeting might include the CRESST validity criteria for performance assessment. Published in a 1991 CRESST technical report,² the CRESST criteria include:

• consequences of performance assessment;

• equity, including test fairness and opportunities for students to learn assessed knowledge and skills;

• transfer and generalizability theory;

• content and curriculum quality, including cognitive complexity, content quality, and content coverage;

• meaningfulness of performance assessments; and

• costs and efficiencies.

² CSE/CRESST Technical Report 331



A Beginning: What We Seem to Know About
Standardized Tests

Test Scores Versus Performance
Although disagreeing on several points, many conference presenters agreed that there are serious problems with traditional standardized tests.

In her research with disadvantaged children, for example, Lily Wong Fillmore from the University of California, Berkeley, found a troubling discrepancy between results of standardized CTBS reading comprehension tests and actual student performance.

"We were doing a performance assessment of language, of cultural adaptation to school, of how students were dealing with the problem of learning a language they did not know," said Wong Fillmore. "When we compared the [CTBS] test scores to what we were [performance] measuring, we found vast differences. With some of these kids, they were so free of English, that is, they would not speak a word of it, yet they did well in the CTBS reading comprehension test. Other students that we knew to be performing quite well," she added, "did very poorly on the CTBS."

This lack of correlation between standardized test scores and meaningful student performance has been noted by others. Standardized tests typically measure basic concepts and procedures, said Thomas Romberg, from the University of Wisconsin-Madison, but not in-depth understanding or student production of knowledge.

"What they [standardized tests] measure, they measure well," noted Romberg, "but what they do not measure is of concern also. The main issue is to let people know that if these [standardized tests] are the instruments they are using, then basic concepts and procedures are the mathematics they are assessing, which is a small part of knowing and being able to use mathematics."

Improper Use of Standardized
Test Results

Other conference presenters had less negative views of standardized tests. H.D. Hoover, University of Iowa, suggested that the problem lies not with standardized tests themselves but with the improper use of such assessments.

"There is no doubt," said Hoover, "the way [standardized] tests have been used has sometimes been horrible. I think this mostly has been brought about by mandated state assessment programs and using these tests for accountability in high stakes situations for which they were never intended."

However, despite relatively widespread agreement about the problems of standardized tests and the utilization of their results, policymakers and the public will likely continue to use declining standardized test scores as a rallying cry for what's wrong with American education. And despite evidence that high stakes accountability "uses" corrupt the testing process, policymakers continue to have great faith in the power of assessment.

What We Know About
Performance Assessment

State Interest in Assessment 

State policymaker interest in assessment is growing, suggested Lorraine McDonnell from the University of California, Santa Barbara, because many policymakers view assessment as a lever of change. As part of a CRESST project, McDonnell is conducting an extensive investigation of the move towards new forms of assessment in Kentucky, California, Indiana, and North Carolina. She has found that state policymakers frequently support assessments for very different purposes.

In California, said McDonnell, the new performance-based California Learning Assessment System (CLAS) came about because of a rare consensus among the three state centers of education power: the governor, the legislature, and the state school superintendent. But each center had its own reasons for wanting new assessments.

"Governor Wilson would like to move to a system of merit-pay where teachers with high scoring students are rewarded," reported McDonnell. "One of the governor's aides told us: 'We could care less about authentic assessment; it costs more money and we don't know if it's any better. For us, having individual student scores is really powerful. It brings accountability into a system where it isn't there now. Parents can then say they don't want their child in Ms. Smith's classroom because they will have the [necessary] assessment information.'"

Offering a very different reason for the same new assessments was the California state legislature, primarily Senator Gary Hart, chair of the Senate Education Committee. According to McDonnell, Hart agreed to exchange "greater accountability in order to get greater autonomy for instruction and school operation. It was quid-pro-quo for having schools move to site-based management," said McDonnell.

The final policymaker in the game, Bill Honig, the former California Superintendent of Public Instruction, was "interested in assessments that are more congruent with the type of curriculum he espouses," added McDonnell, "assessments that measure real-world performance and will influence teaching."

The lesson, concluded McDonnell, is that test developers, schools, districts, researchers, and practitioners will have to accommodate multiple and sometimes competing policymaker purposes that drive performance assessment development, implementation, and use.

Federal Interest Grows

Meanwhile, the federal government has been looking at assessment as a lever for national educational reform. Several CRESST presenters suggested that momentum for national standards and national performance tests is rapidly growing.

"Improving assessment became an issue [in recent years] not to improve assessment but because of the woeful state of education reported throughout the country," said Andrew Hartman, education policy coordinator for the Republican staff of education committee.

Agreeing with Hartman was Michael Feuer, Office of Technology Assessment.

"The spirit in Washington has been tense," said Feuer. "National standards and national curriculum this last year have been prominent concepts in Washington and we seemed [at one point] to be moving towards national testing."

National standards are a likely reality, added Feuer. Yet fears about a single national test may lead to different assessments developed by individual states or clusters of states, an outcome with its own set of possible negative consequences. Policymakers, parents, the public, employers, everyone, will want to compare these different performance assessments to one another, despite warnings from the research community that precise comparisons may be technically impossible.

"Comparing results from different types of high stakes assessments is questionable under most situations," said CRESST Co-director Bob Linn, "unless the assessments have been developed from very similar standards."

Furthermore, invalid compari- 
sons may result in incorrect deci- 
sions about school or teacher per- 
formance, or worse yet, incorrect 
decisions about students.

What We Know About
Performance Assessment and
Fairness

More Issues Than Solutions 

A second CRESST validity criterion, fairness, was addressed by several conference presenters who agreed that ensuring fairness for students who take performance assessments is at least as difficult as ensuring fairness for students who take standardized tests. They noted that the impact of new assessments on disadvantaged children could be severe if student opportunities to learn remain unequal.

"If we put forth the worst-case scenario for African-American and Latino children," said CRESST researcher Linda Winfield, "where the instructional conditions are marginal, facilities are poor, where the actual assessment might be based on exercises or content that is totally foreign or the language is foreign, where the raters might be biased, then we have a situation where performance-based measures will be much worse than traditional measures in the form of standardized tests."

The language dependency of many alternative assessments is also troubling according to several conference participants, including CRESST Associate Director Joan Herman.

"How do we separate language proficiency from content knowledge and thinking skills?" asked Herman. "The problem is particularly acute for non-native speakers, who are disadvantaged by the many assessments requiring verbal fluency."

Lily Wong Fillmore also noted the reverse problem. She suggested that limited English speaking Asian students may understand less than what we assume and that such assumptions are cause for concern.

"Whether an assessment underestimates or overestimates a student's competence," said Wong Fillmore, "there are serious equity issues. If a test, whatever sort, favors individuals who are in fact not doing as well as you think, then what you get is a kind of neglect of the educational needs of those kids."

Wong Fillmore suggested that adequate attention and resources must be devoted to improving the skills of all children, regardless of cultural background.

Fairness in Portfolios

Research conducted in Pittsburgh by Paul LeMahieu, University of Delaware and the Delaware Department of Public Instruction, indicates that serious attention must be paid to whether or not portfolios are equitable for all students. During his presentation LeMahieu explained one of his portfolio studies:

"We examined two groups of students," said LeMahieu, "for whom we had access to both the full bodies of their work as well as the portfolios that resulted from their selections of work from that whole. One group scored higher on the portfolio selections. This group was made up of high achievers who were also predominantly white. The second group was a low achieving group, made up primarily of minority students. Their portfolios were rated lower than the full body of their work."

LeMahieu believes that the lower portfolio scores of the second group may have resulted because this group did not deeply understand the purposes of the portfolios or the standards against which their portfolio work would be measured, or because the students did not have the self-reflection skills necessary to assemble higher quality portfolios, ones that presented themselves more faithfully.

"Apparently the first group understood how their work would be judged," said LeMahieu, "and how to present themselves well with respect to the evaluative criteria in use. Moreover, they knew how to examine their work with a critical eye in compiling the portfolios that would represent them. The second group did not have access to these understandings. Obviously such knowledge and skills need to be the object of explicit instructions in order to avoid this potential source of bias."

Dennie Palmer Wolf, from Project PACE, suggested that LeMahieu's research highlights the importance of instructional and assessment equity for portfolio use in the classroom. To compete on a level playing field, all students must have deep understandings of portfolio purposes, standards, and processes.

"There are all kinds of things about putting together your portfolio which we may be teaching to some students and not teaching to others," said Palmer Wolf. "We have a deep responsibility to think about these kinds of equity issues."






What Works in Performance Assessment? 
The 1992 CRESST Conference



Ultimate Responsibility for
Improved Education

Although equity-sensitive tests may significantly contribute to fairness in assessment, they cannot by themselves adequately solve the multitude of equity problems facing education today.

"Improved assessment can contribute to improved education," said CRESST consultant Edmund W. Gordon, City University of New York, "but in the final analysis it is those of us who are responsible for teaching and learning that must ensure that adequate and equitable teaching occurs if effective learning is to be the result."

What We Know About the
Promised Link Between
Instruction, Learning, and
Alternative Assessment

Cognitive theory provides a valuable framework for integrating performance assessments into the classroom. During the conference CRESST researcher Robert Glaser recommended:

"Learning, instruction, and [performance] assessment should be one piece, a system of mutually interacting aspects of teaching. This system should be driven by the cognitive structures that are acquired by students as they achieve knowledge and skill in a subject matter."

Characteristics of knowledge typical of achieving students can be identified, explained Glaser, including how students structure, proceduralize, and self-regulate knowledge for effective use. Glaser pointed out that some students jump into a problem or task without analyzing the nature of the problem, while higher achieving students form a model of the situation that enables them to generate possible approaches and select among various alternatives. Performance assessments should be capable of measuring such knowledge development processes, recommended Glaser.

Role of Teachers in Assessment 

Glaser added that teachers play the pivotal role in this knowledge acquisition process. Other CRESST presenters echoed his feelings that teachers know their students better than just about anyone else.

"Human teachers," said CRESST researcher Richard Snow from Stanford University, "are perhaps the most sensitive assessment device available for looking at student motivational and volitional behavior. Teachers can see it and sense it; it [student performance] is not always verbal."

Teachers must be actively involved in the entire assessment process if learning, instruction, and assessment are to become integrated, motivational factors in the classroom, said Jackie Cheong from the University of California, Davis. Cheong said that portfolios, such as the California Learning Record, enable teachers to understand key student learning processes. Integrating instruction and assessment, the California Learning Record is a portfolio assessment in which students' efforts are documented through structured observation by teachers.

Portfolios Integrate Learning,
Instruction, and Assessment

A chief proponent of the value of portfolios in the learning, instruction, and assessment process, Dennie Palmer Wolf has found that portfolios work best when they afford links across disciplines and are concerned both with high standards and with development. Teachers and schools should maintain portfolios on students for a period of years, said Palmer Wolf, not just for a few months or a single school year. Performance tasks, whatever their nature, must be embedded into the curriculum.

"Students must have time to think, to self-evaluate about their work," explained Palmer Wolf. "The [portfolio] technology is about doing cumulative work and having time in the school, in the curriculum, to reflect on that work."

Palmer Wolf's research from Arts Propel in Pittsburgh and Project PACE in four urban districts provides other "what we know" lessons that contribute to the role of portfolios in the learning, instruction, and assessment process:

• Standards of performance should be made known to students as part of an overall school system that supports portfolio assessment.

• Portfolios must create a conversation between all the teachers in the school. Students should use portfolios to actually think, not merely record information.

• Portfolios should function as examples of student work that must be met prior to students' movement from one grade level to another. Portfolios should not live and die in the middle or elementary school: they must be maintained and passed on to the next higher institution of education. They should act as critical "passports" to the best educational options a student can locate and try.



What We Know About the
Technical Aspects of
Performance Assessment

The technical portion of the CRESST conference focused on what researchers have learned about task development, scoring, comparability, and moderation of performance assessment. As noted by several presenters, the CRESST validity criteria of transfer and generalizability have received substantial early attention and results are now becoming known.

What We Know About Task
Development and Scoring

Many states or consortia of states or counties have embarked on an ambitious effort to develop performance assessment tasks. However, with few models to base their designs upon, developers have found this enterprise formidable and slow-going. Maryland is one state that has formed a successful consortium of counties which have pooled their resources to develop a performance system emphasizing thoughtful mastery of important tasks. But the development process has been problematic.

"When one sees examples of finely crafted performance tasks, they look easy, but they are not," remarked Jay McTighe from the Maryland Assessment Consortium. "We found that task development is a long-term process and extraordinarily difficult work."

Other assessment developers have encountered similar hurdles. Lee Jones, who has been developing a new hands-on science performance assessment for the National Assessment of Educational Progress (NAEP), agreed with McTighe, noting that "the key thing we have learned is that this whole [development] process takes time, time, and more time."

Nevertheless, performance tasks are being developed and some valuable lessons are arising from the process, including methods for reducing costs and resource needs. Eva Baker, along with other CRESST researchers, has developed a performance assessment model that appears applicable across a variety of subject areas and for a variety of topics. The result is a relatively cost-effective method for developing and scoring performance assessments.

CRESST Performance
Assessment Model

Baker's model focuses on explanation skills. Originally applied in social studies/history, the model has been used to measure depth of understanding in science, mathematics, and geography. In the case of social studies/history, the assessment asks students to write extended essays on significant historical events, making use of prior knowledge and source materials such as the Lincoln-Douglas debates.

Some valuable "what we know" lessons have occurred from the research. The scoring system, for example, based on a comparison of expert and novice performances, showed that novices do not bring in external information (prior knowledge) to the assessment whereas experts do. Secondly, novices make some big mistakes: they misunderstand context and write in a very flat way.

"In an attempt to be extremely comprehensive," said Baker, "they [novices] are very afraid to leave anything out. Experts, on the other hand, write explanations that are very principle-oriented."

The analyses provided the scoring dimensions for the assessment: general impression of content quality, prior knowledge, principles or concepts, text detail, misconceptions, and argumentation. This general strategy of basing performance assessment scoring rubrics on differences between expert and novice performances shows promise for other assessments.

CRESST researchers also learned that by developing task specifications, blueprints for parallel tasks, they were able to reduce the number of tasks necessary to get relatively high reliability ratings. This important finding suggests that performance assessments may not require nearly as many tasks as other investigators have suggested.
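One common way to reason about how the number of tasks relates to score reliability is the Spearman-Brown prophecy formula. The sketch below is illustrative only: the report does not say which reliability model the CRESST researchers used, the function names are hypothetical, and the per-task reliability values are made-up inputs rather than CRESST results. It also assumes the tasks are parallel, which is roughly what shared task specifications aim for.

```python
def spearman_brown(single_task_reliability: float, n_tasks: int) -> float:
    """Projected reliability of a score averaged over n_tasks parallel tasks."""
    r = single_task_reliability
    if not 0.0 < r < 1.0:
        raise ValueError("single_task_reliability must be between 0 and 1")
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    return n_tasks * r / (1 + (n_tasks - 1) * r)


def tasks_needed(single_task_reliability: float, target: float) -> int:
    """Smallest number of parallel tasks whose projected reliability reaches target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1")
    n = 1
    # The projection rises toward 1 as tasks are added, so this loop terminates.
    while spearman_brown(single_task_reliability, n) < target:
        n += 1
    return n


# Hypothetical inputs: loosely specified tasks (0.3) versus tightly specified ones (0.6).
for per_task in (0.3, 0.6):
    print(per_task, tasks_needed(per_task, target=0.8))
```

Under these assumed inputs, raising the per-task reliability from 0.3 to 0.6 cuts the number of tasks needed to reach 0.8 from ten to three, which is the general direction of the finding described above.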

What We Know About 
Group Assessment 

Several states, such as Connecticut and California, are attempting to incorporate group assessment into their large-scale testing programs. One intention of such efforts is to use scores from group assessments as indicators of individual performance. However, a key technical question for such assessments is "To what extent do scores on a group assessment actually represent individual performance or knowledge?" A study by UCLA professor and CRESST researcher Noreen Webb sheds some light on this substantial technical question.

Webb gave two seventh-grade 
classes an initial mathematics test 
as a group assessment, where ex- 
change of information and assis- 
tance was common. Several weeks 
later, she administered a nearly 
identical individual test to the same 
students where assistance was not 
permitted. 

The results showed that some students' performance dropped significantly from the group assessment to the individual test. These students apparently depended on the resources of the group in order to get correct answers and when the same resources were not available during the individual test, many of the students were not able to solve the problems. Webb concluded:

"Scores from a group assessment may not be valid indicators of some students' individual competence. Furthermore, achievement scores from group assessment contexts provide little information about group functioning."

Webb's study suggests that states or school districts that intend to assign individual scores based on group assessments may want to seriously rethink their intentions.
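The kind of comparison Webb describes can be sketched as a per-student difference between the group-administered score and the later individual score. This is a minimal illustration, not Webb's actual analysis: the function name, the drop threshold, and all scores below are hypothetical.

```python
def flag_score_drops(group_scores, individual_scores, min_drop):
    """Return {student: drop} for students whose individual score fell
    at least min_drop points below their group-assessment score."""
    flagged = {}
    for student, group_score in group_scores.items():
        if student not in individual_scores:
            continue  # skip students who missed the individual test
        drop = group_score - individual_scores[student]
        if drop >= min_drop:
            flagged[student] = drop
    return flagged


# Hypothetical scores on a 10-point test.
group = {"A": 9, "B": 8, "C": 9, "D": 7}
individual = {"A": 9, "B": 3, "C": 8, "D": 2}

print(flag_score_drops(group, individual, min_drop=3))
```

With these made-up numbers, students B and D would be flagged: their group scores overstate what they could do alone, which is the validity concern raised in the quotation above.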

Generalizability of Tasks and
Assessment Methods

The number of tasks needed to ensure that assessments are reliable measures of student performance is one area where research has provided important results. Reported at the conference was valuable research on science performance tasks conducted by researchers at the University of California, Santa Barbara, including CRESST researchers Richard Shavelson and Gail Baxter.³

Looking for ways to reduce costs and task administration time, Shavelson, Baxter, and others designed a computer simulation of a hands-on performance task that was as close as possible to actual observations of student work. The researchers did everything they could to make the two methods, observations and computer simulations, comparable. But the results showed only a moderate correlation between the methods, even though they were painstakingly conceived, developed, and administered.

³ Gail Baxter is now an assistant professor at the University of Michigan.

"This one [the two tasks] blew us away," said Shavelson. "There's actually a kid who got a [score of] one on the computer simulation task and a six when we observed his performance. And there's another kid who got a one when we observed his performance but got a six on the computer [task]. What this means," says Shavelson, "is that you get a different picture of kids' performance from two methods of measuring performance."
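A simple way to check whether two scoring methods tell the same story is to correlate the paired scores and look at the largest per-student disagreement. The sketch below assumes a Pearson correlation and a 1-to-6 score scale; the report does not state which statistic the researchers computed, and the function names and scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


def largest_gap(x, y):
    """Largest absolute difference between paired scores."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical paired scores for the same students under two methods.
observed = [6, 1, 4, 3, 5, 2, 4, 5]
simulated = [1, 6, 4, 3, 5, 3, 4, 4]

print(round(pearson_r(observed, simulated), 2), largest_gap(observed, simulated))
```

Even when most students score similarly under both methods, a few large reversals like the one-versus-six cases Shavelson mentions pull the correlation down and show up directly in the largest gap.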

UCSB researchers also compared some short-answer responses and multiple-choice results, based on the same science tasks, to the computer simulation and observations of student performance. The only two methods that appeared to be reasonably interchangeable were direct observation and use of a notebook. The notebook required students to conduct the experiment and then report their procedures and results in a specific format.

"The moral of the story is that most [performance assessment] methods are not interchangeable," concluded Shavelson.

This finding has key policymaking and cost implications: When attempting to make high-stakes decisions based on results from different assessments, many tasks and types of assessments may be needed in order to make valid generalizations of student performance.

What We Know About Comparing
Performance Assessments

As previously mentioned, one proposal for a national assessment system would have clusters of states developing performance assessments matched to a national set of standards. CRESST Co-director Robert Linn, however, has strong questions about "if" and "how" the assessment community can make valid comparisons between different assessments developed in this manner. How will we know, for example, that these assessments are measuring the same thing? And based on the CRESST criteria, how will we know that the assessments are comparable in terms of their cognitive complexity, content quality, and content coverage?

During his CRESST presentation, Linn suggested that assessment comparability is one area with a plethora of unresolved issues including differences in tasks and administration conditions.

"Administration of different assessments is just one part of this larger task comparability problem," said Linn. "How do we account for where, when and how long different performance tasks are administered and what instructional preparation children have had prior to taking the test?"

Linn said that task comparability must be considered in relation to two types of students: students who have never taken a performance test and students who have performance tests regularly embedded into their curriculum. "This issue has important equity and opportunity to learn implications," added Linn.

Technical Lessons from the United 
Kingdom — Moderation 

The United Kingdom is considered well ahead of the United States in
the development, use, and scoring of performance assessments. Many
of the comparability issues mentioned by Rob Linn are ones that the
U.K. educational system has had to address. CRESST presenter Desmond
Nuttall from the University of London explained the comparability
problems the British have encountered where performance assessments
have been tied to national standards.

"A major issue for us," said Nuttall, "is whether a grade A in
Southampton has the same meaning, the same utility, the same
standard, as it does in Newcastle. We also have to face the issues
of comparability over time and comparability over different
subjects," he added.

Nuttall noted at least one other technical problem that may have
implications in the United States: grade inflation.

"In 1988 we introduced a new examination based on performance
assessment," said Nuttall, "and in the years since, we've seen
grades improve dramatically. In 1988, some 42% of the students
achieved a grade C or better in the examination. This year the
figure has risen from 42% to 51%. Everyone was congratulating
themselves on the great success of education in raising the
performance of students until our secretary of state revealed a
report which suggests that there were a lot of fallibilities in
human judgment that had gone into the assessment [scoring] and that
it was a phenomenon well-known to you (Americans), grade inflation,
rather than a real improvement of standards. Not everyone agrees
with the Education Secretary, John Patten, but he proceeded to
tighten the system."

In response to such problems, 
Nuttall said the United Kingdom 
turned to moderation to verify
performance assessment scoring 
methods. According to Nuttall: 

"Moderation is the basic quality assurance mechanism we use to make
sure that assessments not only meet the national content and
performance standards, but also meet requirements of validity,
reliability, and equity."

The United Kingdom now uses teachers and curriculum consultants as
moderators for their national assessments. Regularly meeting and
reviewing student work, teachers reach consensus on questions of
task comparability, standards, and scoring procedures. In cases
where the work cannot be brought to a central site, the teachers
visit individual schools to confer and evaluate student work and
attempt to moderate student performances against the national
standards. In addition to increasing the reliability of the
assessments, the process itself has had a very positive effect on
teachers.

"The model of every teacher as a moderator," said Nuttall, "is a
powerful device for professional development of teachers, giving
them access to different ways of both assessing students and of
setting suitable activities for students, preparing them for
assessment."

Nuttall noted that this method may not meet the traditional
technical criteria of validity and reliability, but that the
inclusion of teachers in the moderation process has brought
important gains in supporting the comparability of assessment and
scoring and in enriching teachers' professional skills.

At least one other CRESST presenter reported similar benefits of
getting teachers deeply involved in the entire assessment formula.
Jay McTighe, Maryland Assessment Consortium, said:

"Getting teachers involved early on in the guts of [Maryland
assessment] development was very relevant to the process. We have
learned that working with others to develop performance assessment
and scoring instruments is one of the most powerful forms of
professional development possible."

Ultimately what researchers and others learn from the development of
performance assessments within the United States and from other
countries will address the technical validity criteria of
performance assessment tasks.



What We Know About
Performance Assessment 
Costs and Resources 

The final CRESST validity criterion focuses on the costs of
performance assessment and the resources needed to implement new
assessments in the classroom. Although there are few details about
specific costs and resource requirements associated with performance
assessments, anecdotal evidence indicates that such assessments are
expensive, time-consuming, and resource-intensive.

Resource Needs 

CRESST presenters uniformly agreed that developing and implementing
classroom performance assessments places a tremendous burden on
teachers. Teachers need extra time and extensive professional
development, both of which are usually lacking in most schools
today, if they are to become involved and committed to the
assessment reform process. For example, CRESST researcher Charlotte
Higuchi from Farmdale Elementary School suggested during her
presentation:

"We need time at the classroom level! Two weeks before school, to
think, to plan, to write, to learn, to innovate, to design
[performance assessments]."

Higuchi added that performance assessments require extra time to
conference with children and parents, to review anecdotal notes on
students, and especially "time for teachers to think."

Professional development is another agreed-upon resource
prerequisite. CRESST researchers Maryl Gearhart and Shelby Wolf, for
example, found that teachers involved in a portfolio program
required substantial, continuous professional development to help
them implement standards for student writing. Finding that teachers
did not critically assess their students' portfolio work and did not
fully understand what constitutes quality writing standards, Wolf
conducted a series of workshops to help teachers discern elements of
good writing and then built an assessment rubric founded on these
elements. Gearhart stressed that teachers must have substantial
knowledge of a subject before they can be expected to be good
assessors of it. She noted, however, that few schools or districts
are able to fund this type of comprehensive development program.

Opportunity Costs 

There are other expenses associated with performance assessments,
including opportunity costs. Within a fixed school day, any added
program means the loss of something else. Add portfolios and you
might have to lose computer training. Vermont, for example,
implemented portfolios as a statewide assessment during the 1991-92
school year, focusing on mathematics and writing skills. According
to a CRESST evaluation, in order to implement portfolio programs in
these two topics, Vermont teachers cut back their teaching of other
subjects.

"Performance assessment costs," said CRESST researcher and RAND
social scientist Daniel Koretz, who evaluated the Vermont program.
"And the financial costs I think are not the largest," he added,
"there is a cost in [loss of] content coverage. What will happen
when they [Vermont teachers] have four [portfolio] subjects two
years down the road, I don't know."

Solutions for High Costs and
Resource Needs

CRESST researcher Lorrie Shepard from the University of Colorado,
Boulder, suggested an alternative for classroom teachers who don't
have the time to develop performance assessments on their own but
who are dissatisfied with assessments imposed by others.

"Steal half of the things mentioned at this conference and look at
them in detail," urged Shepard to teachers attending the conference.
"Many examples not used for secure assessments are in the public
domain. Look at the tasks and start collecting different samples of
performance assessment. See what they look like and what the scoring
criteria are that make sense."

Shepard recommended that af- 
ter teachers collect enough tasks, 
they should adapt these perfor- 
mance assessments to their own 
schools and classrooms and then 
learn to develop their own. 

Presenter Thomas Payzant, nominee for Assistant Secretary for
Elementary and Secondary Education and former Superintendent of San
Diego City Schools, was philosophical about the various time and
cost burdens discussed by others.

"Before we become too hard on assessment," said Payzant, "remember
that if we do curriculum development, it's hard and takes a lot of
energy. If we really focus on teaching strategies and how kids
learn, it's hard, time consuming and costly, and takes a lot of
energy. So why should we expect the assessment effort to be any
different? Perhaps we can get some economies of energy and scale by
doing the three simultaneously," recommended Payzant.
Dan Resnick, who is directing assessment development for the New
Standards Project, has found that the most efficient way to develop
performance assessments is through a partnership of organizations
who share costs. Seventeen states and six urban districts have
joined the New Standards Project, the largest single group of states
and districts across the country developing performance assessments
matched to specific standards.

Meanwhile, as already mentioned, Maryland has successfully pooled
its statewide resources and created an assessment consortium
involving all twenty-four school districts in the state. Creative
assessments will require creative solutions to solve significant
time and resource constraints.

Worthwhile Results With 
Resources In Place 

The good news is that when adequate resources are in place,
assessment reform appears to happen. Presenter Elizabeth Rogers from
Charlottesville City (VA) Public Schools noted that once her school
district met teachers' needs, in terms of professional development,
dissemination of research, and instructional support, change
occurred.



"Teachers began to invent, to talk to other teachers, to read
research," said Rogers, "and they created sophisticated
[performance] methods for assessing students."

The admonition is that anyone 
who thinks that assessment re- 
form can occur without additional 
resources is likely to find that few
classroom changes are actually 
implemented. But when resource 
needs are foreseen and met, and 
teachers are part of the entire 
process, change is not only pos- 
sible, but significant. 



What Else We Know About 
Performance Assessment 

What we know is that lots of performance assessments are being
developed. CRESST presenters shared their efforts in the areas of
mathematics, science, literacy, social studies, workforce readiness,
and several other topics, such as portfolios and multidisciplinary
assessment. The following comments from various presenters
contributed to the growing knowledge of "what works" in performance
assessment.



Effects of Performance Assessment 

• One positive effect of performance assessment is that learning
and assessment is now a continuous process: Students have to re-do
[performance] tasks.

Ml- lady L 'U u, 
Littleton High School, Denver
Performance Assessments in Science

• On balance, people [Vermont teachers and principals] were
positive about the impact on instruction. Some people were
practically euphoric. Teachers who everyone thought were the least
likely to change their instruction were, in fact, finally changing.

Daniel Koretz,
CRESST/The RAND Corporation
Models for Collaborative Assessment Development

• Performance assessment does not take away from instruction but
instead offers students another opportunity to learn.

Gail Baxter,
University of Michigan
Performance Assessments in Science
• Teachers describe significant shifts in their instructional and
assessment practices. To a large extent they cannot and do not
separate the two.

Elizabeth Rogers,
Charlottesville City Public Schools
Performance Assessments in Literacy

• In the national evaluation of performance assessments, teachers
reported the [scoring] agreement that they were able to reach as one
of the most important and useful parts of the whole performance
assessment paraphernalia. The process also helped to insure more
uniform assessment among teachers.

Desmond Nuttall,
University of London
Lessons From Performance Assessment in the United Kingdom

• Authentic assessment in social studies seems to lead students to
become involved in a topic, providing students a deeper
understanding of social issues, and a greater comprehension of the
interconnections of specific historical periods.

Cris Gutierrez,
Jefferson High School, Los Angeles
Performance Assessments in Social Studies

• Assessment does drive school instruction. As one teacher involved
in piloting a science task remarked, "If this is the way you are
testing, then this is the way we are going to teach. This will
really impact what we do in our classroom."

Kathy Comfort,
California Department of Education
Performance Assessments in Science

Many Remaining Challenges



• Unfortunately, the performances on the open-ended items
[performance assessments in science] have very thin results indeed.
This recurrent problem may be attributable to students' inexperience
with performance assessments or (the possibility) that many students
just aren't very good writers.

Darrell Bock,
CRESST/University of Chicago
Science Performance Assessments

• We are a lot less further along about delivery standards than we
are about content and performance standards. Delivery standards are
criteria to judge whether states, school systems, schools, and
classrooms are providing an education that will enable students to
achieve those [content and performance] standards. In essence, the
burden should be on the system, not on the students, for educating
children. Otherwise, performance standards are not fair.

Hilda Borko,
CRESST/University of Colorado
Service Delivery Standards

• In scoring our field tests in reading, mathematics, and social
studies, we found that students had difficulty writing about
specific content areas. They could often arrive at an answer but be
unable to explain how they arrived at the answer or why it was the
correct one. They are not accustomed to justifying or explaining;
they are accustomed to recall and formulas.

Daisy Vickers,
North Carolina State Department of Instruction
Multidisciplinary Assessments

• Findings from the QUASAR project indicate that student
performance across different mathematics tasks is inconsistent.
Consequently, if student-level scores are of interest, more than
nine tasks may be required to obtain reliable results of student
performance.

Suzanne Lane,
University of Pittsburgh
Performance Assessment in Mathematics

Findings About Prompts and 
Scoring 

• One of the tricks in multidisciplinary assessment is that you can
give students a single task, but use different kinds of rubrics for
scoring, depending on what you are interested in measuring. For
example, we asked students in the Humanitas multidisciplinary
program to write essays on a topic integrating their knowledge of
history, literature, and the arts, and then we scored the essays
according to both writing quality and subject matter understanding.

Pamela Aschbacher
Multidisciplinary Assessments



• Many tests used for accountability, such as NAEP, have high
stakes for administrators and no stakes for students. A key question
is how to motivate students when the test does not count. Our
research indicates that financial incentives increase test
performance but nonfinancial incentives do not. These findings were
particularly true for eighth graders on easier NAEP items.

Harold O'Neil, Jr.
CRESST/University of Southern California
NAEP Motivation Study



In Conclusion 

The 1992 annual CRESST conference synthesized much of the current
knowledge of what researchers, teachers, and assessment policymakers
believe "works" in performance assessment. Although much has been
discovered in the last few years, the current performance assessment
movement is still in its infancy, and the demand for knowledge about
what "works" in performance assessment will likely exceed the supply
for some time to come.



Portfolio Assessment Videotape!

Join the CRESST research staff, including Eva Baker and Maryl
Gearhart, in "Portfolio Assessment and High Technology." This
10-minute production, made in 1992, examines key issues of portfolio
assessment including:

• Student use of portfolios 
in the classroom; 

• Selecting students' best 
pieces of classroom work; 

• Involvement of parents 
in the portfolio process; 

• Use of technology to promote
good writing;

• Electronic student port- 
folios. 



This videotape will be useful to 
school districts, principals and 
teachers interested in building 
their own portfolio programs, as 
well as researchers who want more 
information about the latest 
CRESST research programs.

The cost of "Portfolio Assessment and High Technology" is $10.00
and may be ordered on page 24.
NEW!

Performance-Based Assessment
and What Teachers Need

Charlotte Higuchi

CSE Technical Report 362, 1993

($4.00)

Arguing that effective implementation of performance assessments in
the classroom requires systemic reform of the teaching profession
and of the school systems in which they work, teacher Charlotte
Higuchi discusses the criteria that will result in improved
classroom assessment. She suggests that alternative assessments
offer teachers a critical tool for understanding their children.

"Multiple-choice tests eliminate teacher judgment in the assessment
process," says Higuchi, "and are frequently not aligned with the
instructional program. In contrast, performance-based assessments
are individual or collective teacher judgments. They give rich,
detailed information as to what students can and cannot do, and
therefore enable teachers to plan instruction based on student
needs."

To help teachers develop and implement their own performance
assessments and to become teacher-researchers, Higuchi urges school
districts to provide the following minimum resources:

• Time to think, to learn, to write, to collaborate, to analyze, to
plan, and to create new forms of assessment;

• Work space including desks, chairs, file cabinets, and storage
cabinets for equipment;

• Computers and printers, a phone, and a fax;

• Duplicating services to copy student work for portfolios and
assessment records;

• Clerical support to type correspondence, order materials, and
maintain records;

• An onsite library with journals from professional organizations,
the latest books on education, and a media center.

"Full implementation of performance-based assessments," says
Higuchi, "demands that teachers constantly discuss student
performance, standards of performance, and how to change the
instructional program to improve that performance."

STILL NEW! 

Sampling Variability of Performance Assessments

Richard Shavelson, Xiaohong Gao, and Gail Baxter

CSE Technical Report 361, 1993

($4.00)

The authors of this study examined the cause of measurement error in
a number of science performance assessments. In one part of the
study, 186 fifth- and sixth-grade students completed each of three
science tasks: an experiment to measure the absorbency of paper
towels; a task that measured students' ability to discover the
electrical contents of a black mystery box; and a task requiring
students to determine sow bugs' preferences for various environments
(damp vs. dry, light vs. dark).

The researchers found that the measurement error was largely due to
task sampling variability. In essence, student performance varied
significantly from one task sample to another.

Based on their study of both 
science and mathematics perfor- 
mance assessments, the authors 
concluded that "regardless of the 
subject matter (mathematics or 
science), domain (education or 
job performance) or the level of 
analysis (individual or school), 
large numbers of tasks are needed 
to get a generalizable [dependable]
measure of performance."
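
As a rough illustration of why the number of tasks matters, the sketch below uses a simplified one-facet (students by tasks) generalizability coefficient: student variance divided by student variance plus task-related error variance averaged over the number of tasks. This is a textbook simplification, not the authors' actual analysis, and the variance values and function names are hypothetical, chosen only to show the shape of the relationship.

```python
def g_coefficient(var_person, var_error, n_tasks):
    """Generalizability coefficient for a score averaged over n_tasks.

    var_person: variance due to true differences between students.
    var_error:  task-related error variance for a single task
                (hypothetical value; not taken from the report).
    """
    return var_person / (var_person + var_error / n_tasks)


def tasks_needed(var_person, var_error, target, max_tasks=1000):
    """Smallest number of tasks whose averaged score reaches the target
    coefficient, or None if it is not reached within max_tasks."""
    for n in range(1, max_tasks + 1):
        if g_coefficient(var_person, var_error, n) >= target:
            return n
    return None


# Hypothetical variance components: error from task sampling is large
# relative to true differences between students.
var_person = 1.0
var_error = 3.7

for n in (1, 5, 10, 15):
    print(n, "tasks ->", round(g_coefficient(var_person, var_error, n), 2))

print("tasks needed for 0.80:", tasks_needed(var_person, var_error, 0.80))
```

With these made-up numbers, a single task yields a coefficient near 0.2 and fifteen tasks are needed to reach 0.80, which mirrors the qualitative conclusion that large numbers of tasks are required when task sampling variability dominates.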

In another part of the study, the researchers evaluated the methods
in which students were assessed on several of the same experiments
including:

• a notebook — a method 
in which students con- 
ducted the experiment, 
then described in a note- 
book the procedures they 
followed and their con- 
clusions; 

• computer simulations of 
the tasks; and 

• short-answer problems where students answered questions dealing
with planning, analyzing or interpreting the tasks.

The notebook and direct observations were the only methods that
appeared to be fairly interchangeable. The results from both the
short-answer problems and the computer simulations were
disappointing.

Increasing the number of tasks is costly and time consuming,
conclude the authors. But they warn that trying to explain away
technical problems is dangerous.

CRESST Performance Assessment Models: Assessing Content Area
Explanations

Eva Baker, Pamela Aschbacher, David Niemi, and Edynn Sato,
1992 ($10.00)

This assessment model, based on a highly contextualized history
performance task, requires students to engage in a sequence of
assessed steps, including an initial evaluation of their relevant
background knowledge of the particular historical period. Students
write an extended essay that explains the positions of the authors
of the original text materials, such as the Lincoln-Douglas debates,
and draw upon their own background knowledge for explanation. The
essay scoring rubric consists of six dimensions: a General
Impression of Content Quality scale, and five analytic subscales.

Included in the handbook are: background information on the CRESST
performance-based assessment, examples of assessments for
secondary-level history and chemistry, and specifications for
duplicating the technique with other topics and subject matter
areas. The rater training process, scoring techniques, and methods
for reporting results are described in detail.

Raising the Stakes of Test Administration: The Impact on Student
Performance on NAEP

Vonda L. Kiplinger and Robert L. Linn

CSE Technical Report 360, 1993
($4.00)

The National Assessment of Educational Progress (NAEP) test has been
accused of underestimating student achievement because this
"low-stakes" assessment has no consequences for students, their
teachers, or their schools. In contrast, "high-stakes" tests, those
assessments that have serious consequences for students, teachers,
and schools, are assumed to motivate greater student performance
because of the positive or negative consequences (such as college
entrance) associated with student performance on the test.

The purpose of this study was to investigate whether differences in
test administration conditions and presumed levels of motivation
created by the different testing environments affect student
performance on the NAEP test. The testing conditions studied were
the "low-stakes" environment of the current NAEP administration and
a "higher-stakes" environment typified by many state assessment
programs.

The results of the study lead to the conclusion that estimates of
achievement from NAEP would not be substantially higher if the
stakes were increased to the level associated with a "higher-stakes"
test.

Issues in Innovative Assessment for Classroom Practice: Barriers
and Facilitators

Pamela Aschbacher

CSE Technical Report 359, 1993

($4.50)

As proven by the British experience, we cannot assume that new
innovative assessments will be immediately understood and embraced
by American teachers. Implementing performance assessments may
demand new roles for teachers and students and require a radical
paradigm shift among educators, from a focus on content coverage to
outcomes achieved.

This paper, utilizing an action research approach, describes the
findings of CRESST researchers who observed, interviewed, and
surveyed teachers involved in implementing alternative assessments
into their classrooms. Probably the most fundamental barrier to
developing and implementing sound performance assessments was the
pervasive tendency of teachers to think about classroom activities
rather than student outcomes. Teachers who used portfolios, for
example, focused on what interesting activities might be documented
in the portfolios rather than what goals would be achieved as a
result of these instructional activities.

The study revealed other basic barriers in the development and
implementation of alternative assessments, including teacher
assessment anxiety, lack of teacher time and training, and teachers'
reluctance to change.

Writing What You Read: Assessment as a Learning Event

Shelby Wolf and Maryl Gearhart
CSE Technical Report 358, 1993
($4.00)

This report focuses on the central role of teachers' interpretative
assessments in guiding the growth of young writers. The teacher
serves as critical reader and responder, providing commendations and
recommendations for further growth. But the teacher is not the only
expert. Students too are encouraged to participate in assessment
dialogues, reflecting, analyzing, and contributing to their growth.

The authors of this report propose a new scheme to guide teachers'
and students' reflection. Focusing on narrative criticism and
composition, the scheme is based on eight components of narrative:
genre, theme, characters, setting, plot, point of view, style, and
tone.

Omitted and Not-Reached Items in Mathematics in the 1990 National
Assessment of Educational Progress

Daniel Koretz, Elizabeth Lewis, Tom Skewes-Cox, and Leigh Burstein

CSE Technical Report 357, 1992
($4.00)

Non-response to test items on the National Assessment of Educational
Progress has been a concern for some time, particularly in the case
of mathematics. Until recently, the primary concern has been
"not-reached" items, that is, items not answered because the student
failed to complete the test, as opposed to omitted or skipped items.

The study examined patterns of non-response in the three age/grade
groups (age 9/grade 4, age 13/grade 8, and age 17/grade 12) included
in the 1990 assessment of mathematics.

The results showed that overall 
omit rates were modest in grades 
4 and 8, and not-reached rates 
were greatly reduced from 1986 
levels. Differences in non-response
between white and minority stu- 
dents were less severe than they 
first appeared when adjusted for 
apparent proficiency differences. 
Gender differences in omit rates 
were infrequent. 

Nonetheless, the results pro- 
vide grounds for concern. Omit 
rates were high for a subset of 
open-ended items, and the pro- 
portion of items with high omit 
rates in grade 12 was substantial. 
The omit-rate differentials be- 
tween white and minority stu- 
dents, especially for open-ended 
items, are troubling and will likely
become more so as the NAEP
continues to increase its reliance 
on such items. Taken together, 
these results suggest the need for 
routine but focused monitoring 
and reporting of non-response
patterns. 



Latent Variable Modeling of Growth With Missing Data and
Multilevel Data

Bengt Muthén

CSE Technical Report 356, 1992
($2.50)

This paper describes three im- 
portant methods of multivariate 
analysis which are not always 
thought of in terms of latent vari- 
able constructs, but for which la- 
tent variable modeling can be used 
to great advantage. These methods are:
random coefficients describing
individual differences in
growth; unobserved variables cor- 
responding to missing data; and 
variance components describing 
data from cluster sampling. The 
methods are illustrated using 
mathematics achievement data 
from the National Longitudinal 
Study of American Youth.

The Reliability of Scores From the 1992 Vermont Portfolio
Assessment Program

Daniel Koretz, Brian Stecher, and Edward Deibert

CSE Technical Report 355, 1993
($3.00)

A follow-up report to the same 
study (CSE Report 350), this
report presents CRESST's findings
about the reliability of scores
from the Vermont portfolio as- 



sessment program. In this com- 
ponent, the researchers focused 
not on the program's impact as an 
educational intervention, but 
rather on its quality as an assess- 
ment tool. 

The "rater reliability," that is, the extent of agreement between
raters about the quality of students' portfolio work, was on average
low in both mathematics and writing. However, reliability varied,
depending on subject, grade level, and the particular scoring
criterion, and in a few instances it could be characterized as
moderate. The overall pattern was one of low reliability, however,
and in no instance was the scoring highly reliable.
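
To make the idea of agreement between raters concrete, the sketch below computes two common summaries for a pair of raters scoring the same portfolios: the share of portfolios given identical scores and the Pearson correlation between the two sets of scores. The scores are invented for illustration, the function names are hypothetical, and this is not a reconstruction of the statistics used in the Vermont evaluation.

```python
from math import sqrt


def exact_agreement(scores_a, scores_b):
    """Proportion of portfolios on which two raters gave the same score."""
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)


def pearson_r(scores_a, scores_b):
    """Pearson correlation between two raters' scores.

    Assumes both lists have the same length and that neither rater
    gave every portfolio the same score (which would make the
    correlation undefined).
    """
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    ss_a = sum((a - mean_a) ** 2 for a in scores_a)
    ss_b = sum((b - mean_b) ** 2 for b in scores_b)
    return cov / sqrt(ss_a * ss_b)


# Invented scores on a 1-4 scale for eight portfolios (illustration only).
rater_1 = [1, 2, 2, 3, 3, 4, 4, 2]
rater_2 = [2, 2, 1, 3, 4, 3, 4, 3]

print("exact agreement:", exact_agreement(rater_1, rater_2))
print("correlation:", round(pearson_r(rater_1, rater_2), 2))
```

Low values on either summary mean that a student's score depends heavily on which rater happened to read the portfolio, which is the concern the report raises.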

Although it may be unrealistic 
to expect the reliability of portfo- 
lio scores to reach the levels ob- 
tained in standardized perfor- 
mance assessments, the Vermont 
portfolio assessment reliability 
coefficients are low enough to 
limit seriously the uses of the 1992
assessment results. The report
concludes with an analysis of issues
that need to be considered in
improving the technical quality of 
the assessment. 
Assessment of Conative Constructs for Educational Research and
Evaluation: A Catalogue

Richard Snow and Douglas Jackson

CSE Technical Report 354, 1992
(SH.OO) 

In recent years, an overabundance of psychological constructs and
their associated measures have been presented by educational
researchers and program evaluators. Among the most interesting and
potentially useful of these constructs are those reflecting
motivational and volitional aspects of human behavior, called
"conative constructs." Among the constructs in this category are:
need for achievement and fear of failure, beliefs about one's own
abilities and their development, feelings of self-esteem and
self-efficacy, attitudes about particular subject-matter learning,
and many others.

This catalogue brings together in one place those conative
constructs that seem most promising as useful for future research
and evaluation work in education. For each catalogued construct, the
authors provide a brief review covering construct definition,
theoretical base, assessment procedures, references, and where
possible, study abstracts evaluating assessment instruments or
otherwise bearing on appropriate construct validation.

The Apple Classrooms of
Tomorrow: The UCLA
Evaluation Studies

Eva L. Baker, Maryl Gearhart, and
Joan L. Herman

CSE Technical Report 353, 1993
($3.50)

The Apple Classrooms of
Tomorrow (ACOT) project was
initiated in classrooms at five
school sites in 1985 as a program
of research on the impact of
interactive technologies on teaching
and learning. While the project
has expanded over time to
encompass a larger and more diverse
set of efforts, key components at
all sites were the provision of high
technology access, site freedom
to develop technology-supported
curriculum and pedagogy as
appropriate to site goals, and the
resulting study of what happens
when technology support is readily
available to students and teachers.

Four basic questions guided the 
evaluation: 

1. What is the impact of
ACOT on students?

2. What is the impact of
ACOT on teachers' practices
and classroom processes?

3. What is the impact of
ACOT on teachers
professionally and personally?

4. What is the impact of
ACOT on parents and
home life?

This report summarizes
findings from 1987 through 1990.

Collaborative Group Versus
Individual Assessment in
Mathematics: Group Processes and
Outcomes

Noreen Webb

CSE Technical Report 352, 1993
($4.00)

This study asked the question:
"To what extent do scores on a
group assessment actually
represent individual performance or
knowledge?" Researcher Noreen
Webb gave two seventh-grade
classes an initial mathematics test
as a group assessment, where
exchange of information and
assistance was common. Several weeks
later, she administered a nearly
identical individual test to the same
students where assistance was not
permitted.

The results showed that some
students' performance dropped
significantly from the group
assessment to the individual test.
These students apparently
depended on the resources of the
group in order to get correct
answers, and when the same resources
were not available during the
individual test, many of the
students were not able to solve the
problems. "Scores from a group
assessment," said Webb, "may not
be valid indicators of some
students' individual competence.
Furthermore, achievement scores
from group assessment contexts
provide little information about
group functioning."

Educational Assessment: Ex- 
panded Expectations and Chal- 
lenges 

Robert Linn 

CSE Technical Report 351, 1992
($3.50)

"Educational policymakers are
keenly interested in educational
assessment," says Robert L. Linn
in his 1992 Thorndike Award
address to the American
Psychological Association. Linn points
to the various attractions that
assessments have for policymakers
who frequently think of
assessment as a "kind of impartial
barometer of educational quality."
But assessments are frequently
used for two questionable
purposes, notes Linn: first, to point
out the declining quality of
American education and, secondly, as
an instrument of educational
reform. "Such greatly expanded,
and sometimes unrealistic,
policymaker expectations," he says,
"together with the current press for
radical changes in the nature of
assessments, represent major
challenges for educational
measurement." Linn concludes his
remarks by saying that the
measurement research community must
make sure that the consequences
for any new high-stakes
performance assessment system are
better investigated than they were
for previous assessment reforms.

The Vermont Portfolio Assess- 
ment Program: Interim Report 
on Implementation and Impact, 
1991-92 School Year 

Daniel Koretz, Brian Stecher, and
Edward Deibert

CSE Technical Report 350, 1992
($6.00)

Vermont is the first state to
make portfolios the backbone of a
statewide assessment system.
Daniel Koretz, Brian Stecher, and
Edward Deibert, the authors of
this CRESST/RAND report,
have been evaluating the Vermont
portfolio program for almost two
years. The researchers found that
support for the Vermont
portfolio program, despite tremendous
demands on teacher time, is
widespread. "Perhaps the most telling
sign of support for the Vermont
portfolio program," write the
authors, "is that [even in the pilot
year] the portfolio program had
already been extended beyond the
grades targeted by the state."

An interesting instructional
phenomenon was that over 80%
of the surveyed teachers in the
Vermont study indicated that they
had changed their opinion of
students' mathematical abilities based
upon their students' portfolio
work. In many cases, teachers
noted that students did not
perform as well on the portfolio tasks
as on previous classroom work.
This finding, supported by other
performance assessment research,
suggests that portfolios may give
teachers another assessment tool
that appears to broaden their
understanding of student
achievement.

Design Characteristics of Sci- 
ence Performance Assessments 

Robert Glaser, Kalyani Raghavan,
and Gail Baxter

CSE Technical Report 349, 1992
($3.00)

Part of a long-range goal to
investigate the validity of
reasoning and problem-solving
assessment tasks in science, this report
describes progress in analyzing
several science performance
assessment projects. The authors
discuss developments from
Connecticut's Common Core of
Learning Assessment Project, the
California Assessment Program,
and the University of California,
Santa Barbara/California Institute
of Technology research project
"Alternative Technologies for
Assessing Science
Understanding." The analysis framework
articulates general aspects of
problem-solving performance,
including structured, integrated
knowledge; effective problem
representation; proceduralized knowledge;
automaticity; and self-regulatory
skills.

Accountability and Alternative
Assessment

Joan Herman

CSE Technical Report 348, 1992
($4.00)

Despite growing dissatisfaction
with traditional multiple choice
tests, national and state
educational policies reflect continuing
belief in the power of good
assessment to encourage school
improvement. The underlying logic
is strong. Good assessment sets
meaningful standards, and these
standards provide direction for
instructional efforts and models
of good practice. But are these
reasonable assumptions? How
close are we to having the good
assessments that are required?

This report summarizes the
research evidence supporting
current beliefs in testing, identifies
critical qualities that good
assessment should exemplify, and
reviews the current state of the
research knowledge on how to
produce such measures.

Benchmarking Text Under- 
standing Systems to Human 
Performance: An Exploration 

Frances Butler, Eva Baker, Tine
Falk, Howard Herl, Younghee
Jang, and Patricia Mutch

CSE Technical Report 347, 1991
($5.00)

Benchmarking in the context
of this report means comparing
the performance of intelligent
computer systems to the
performance of humans on the same
task. The results of this report
support the belief that we can
compare system performance to
human performance in a
meaningful way using
performance-based measures. This study
provides direction for researchers who
are interested in a methodology
for assessing intelligent computer
systems.

More Reports 

For a list of over 150 CRESST/
CSE technical reports,
monographs and products, please write
to: CRESST/UCLA, Graduate
School of Education, 405
Hilgard Avenue, Los Angeles, CA
90024-1522. Or call Kim Hurst
at (310) 206-1532.

Forms for the
1993 CRESST conference
will be on their way soon. Please
save the dates September 13-14,
1993 on your calendar for this
special event!